Joon Son Chung

Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization

May 08, 2025

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Apr 29, 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Apr 03, 2025

Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes

Mar 24, 2025

Deep Understanding of Sign Language for Sign to Subtitle Alignment

Mar 05, 2025

LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport

Jan 16, 2025

AdaptVC: High Quality Voice Conversion with Adaptive Learning

Jan 07, 2025

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation

Dec 28, 2024

VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis

Dec 26, 2024

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

Nov 29, 2024